WHAT IS CLAIMED IS: 



1. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
    a word list including at least one word;
    transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform;
    a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
        sets of initial pronunciations of the word,
        a scoring module configured to score pronunciations and to generate phone probabilities, and
        a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and
    a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
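
For illustration only, the substitution recited in claim 1 might be sketched in Python as follows. The function names, the ARPAbet-style phones, and every numeric value are hypothetical and do not come from the claims. The sketch assumes that per-phone probabilities and substitute-phone scores have already been produced by a scoring module.

```python
from typing import Dict, List


def lowest_probability_index(phone_probs: List[float]) -> int:
    """Return the position of the phone with the lowest probability."""
    return min(range(len(phone_probs)), key=phone_probs.__getitem__)


def substitute_alternates(
    phones: List[str],
    phone_probs: List[float],
    substitute_scores: Dict[str, float],
) -> List[List[str]]:
    """Replace the weakest phone with each candidate, best-scoring first."""
    idx = lowest_probability_index(phone_probs)
    ranked = sorted(substitute_scores, key=substitute_scores.get, reverse=True)
    alternates = []
    for candidate in ranked:
        if candidate == phones[idx]:
            continue  # substituting a phone for itself yields no new pronunciation
        alternates.append(phones[:idx] + [candidate] + phones[idx + 1:])
    return alternates


# Hypothetical highest-scoring initial pronunciation and scores (assumed values).
best_initial = ["T", "AH", "M", "EY", "T", "OW"]
probs = [0.95, 0.40, 0.90, 0.85, 0.92, 0.88]
subs = {"AH": 0.40, "OW": 0.35, "AA": 0.15}
alternates = substitute_alternates(best_initial, probs, subs)
```

Under these assumed values the second phone is the weakest, so the first alternate replaces it with the highest-scoring candidate other than the phone itself.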

2. The system of claim 1, wherein the transcribed acoustic data includes
    a plurality of waveforms for the word, and
    transcribed text for each waveform of the plurality of waveforms.

3. The system of claim 2, wherein the plurality of waveforms are acoustic representations of the word spoken by a plurality of speakers.

4. The system of claim 1, wherein the word list includes a plurality of words.

5. The system of claim 4, wherein the transcribed acoustic data includes
    a plurality of waveforms for the plurality of words, and
    transcribed text for each waveform of the plurality of waveforms.

6. The system of claim 5, wherein the waveforms of the plurality of waveforms are acoustic representations of the plurality of words spoken by a plurality of speakers.

7. The system of claim 1, wherein the pronunciation-learning module is further configured to:
    force-align the sets of initial pronunciations to the waveform; thereafter
    generate the set of alternate pronunciations; and
    add the set of alternate pronunciations to the pronunciation dictionary.

8. The system of claim 7, wherein the scoring module is configured to score the sets of initial pronunciations.

9. The system of claim 8, wherein the scoring module is configured to generate a phone probability for each phone in a highest-scoring set of initial pronunciations and for each substitute phone in a set of substitute phones.

10. The system of claim 1, wherein the phone probabilities are posterior probabilities.
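
As a hedged illustration of what posterior phone probabilities could look like in practice, the Python sketch below normalizes per-phone acoustic log-likelihoods with Bayes' rule. The uniform prior over candidate phones, the function name, and the numeric values are all assumptions made for this example; the claims do not specify how the posteriors are computed.

```python
import math
from typing import Dict


def phone_posteriors(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Turn per-phone log-likelihoods into posteriors, assuming a uniform prior."""
    # Subtract the maximum before exponentiating to avoid numeric underflow.
    peak = max(log_likelihoods.values())
    weights = {p: math.exp(ll - peak) for p, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Hypothetical log-likelihoods for three candidate phones over one aligned segment.
posteriors = phone_posteriors({"AH": -4.0, "OW": -4.5, "AA": -6.0})
```

The resulting values sum to one, so the lowest-probability phone in a pronunciation can be found by comparing posteriors across aligned segments.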

11. The system of claim 1, further comprising a letter-to-phone engine configured to generate initial pronunciations from which the sets of initial pronunciations are generated.

12. The system of claim 1, wherein initial pronunciations from which the sets of initial pronunciations are generated are extracted from the pronunciation dictionary.

13. The system of claim 1, wherein the scoring module includes an automatic speech recognition (ASR) system configured to score the sets of initial pronunciations.

14. The system of claim 13, wherein the pronunciation-learning module is further configured to graph the sets of initial pronunciations, and the ASR system is configured to score graphed sets of initial pronunciations.

15. The system of claim 13, wherein the ASR system is further configured to generate transcriptions of acoustic data spoken by a plurality of speakers, and wherein the transcriptions are included in the transcribed acoustic data.

16. The system of claim 15, wherein the ASR system is further configured to collect feedback from the plurality of speakers to affirm correct recognition by the ASR system, and if recognition is correct, enter the transcribed words in the transcribed acoustic data.

17. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
    a word list including at least one word;
    transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform;
    a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
        sets of initial pronunciations of the word,
        an automatic speech recognition (ASR) system configured to score pronunciations,
        a scoring module configured to generate phone probabilities, and
        a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and
    a pronunciation dictionary configured to receive the highest-scoring initial pronunciation and a highest-scoring set of alternate pronunciations.

18. The system of claim 17, wherein the word list includes a plurality of words.

19. The system of claim 18, wherein the transcribed acoustic data includes a plurality of waveforms and transcribed text for the plurality of words.

20. The system of claim 19, wherein the waveforms of the plurality of waveforms are acoustic representations of the plurality of words spoken by a plurality of speakers.

21. The system of claim 17, further comprising a letter-to-phone engine configured to generate initial pronunciations from which the sets of initial pronunciations are generated.

22. The system of claim 17, wherein initial pronunciations from which the sets of initial pronunciations are generated are extracted from the pronunciation dictionary.

23. The system of claim 17, wherein the ASR system is configured to score graphed sets of initial pronunciations.

24. The system of claim 17, wherein the ASR system is configured to generate transcriptions of acoustic data spoken by a plurality of speakers, wherein the transcriptions are included in the transcribed acoustic data.

25. The system of claim 24, wherein the ASR system is further configured to collect feedback from the plurality of speakers that the transcriptions generated by the ASR system are words spoken by the plurality of speakers, and wherein if the collected feedback affirms correct recognition by the ASR system, the transcriptions are entered in the pronunciation dictionary.

26. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
    a word list including a plurality of words;
    transcribed acoustic data including a set of waveforms for each of the words and a set of transcribed text corresponding to the waveforms;
    a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
        sets of initial pronunciations of the plurality of words,
        sets of alternate pronunciations of the plurality of words, wherein each set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a unique substitute phone substituted for a lowest-probability phone of the highest-scoring set of initial pronunciations;
    a scoring module configured to score the sets of initial and alternate pronunciations and to generate phone probabilities; and
    a pronunciation dictionary configured to receive the highest-scoring initial pronunciation and a highest-scoring set of alternate pronunciations.

27. The system of claim 26, wherein the sets of alternate pronunciations further include a set of alternate pronunciations that include the highest-scoring initial pronunciation with the lowest-probability phone removed.

28. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having a unique phone inserted adjacent to the lowest-probability phone.

29. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having a sequence of two phones substituted for the lowest-probability phone.

30. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having the lowest-probability phone and a right neighboring phone substituted with a unique phone.

31. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation with the lowest-probability phone and a left neighboring phone substituted with a unique phone.
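
For illustration only, the additional alternates recited in claims 27 through 31 (removal, adjacent insertion, two-phone substitution, and merging with a right or left neighbor) might be enumerated as in the Python sketch below. The function name, the small phone inventory, and the example word are hypothetical. The sketch assumes the index of the lowest-probability phone is already known, and it reads "adjacent" as covering both sides of that phone, which is an interpretation rather than something the claims state.

```python
from itertools import product
from typing import List


def edit_alternates(phones: List[str], idx: int, inventory: List[str]) -> List[List[str]]:
    """Enumerate edited pronunciations around the phone at position idx."""
    before, target, after = phones[:idx], phones[idx], phones[idx + 1:]
    candidates = [before + after]  # claim 27: weakest phone removed
    for p in inventory:
        candidates.append(before + [p, target] + after)  # claim 28: insert on the left
        candidates.append(before + [target, p] + after)  # claim 28: insert on the right
    for p, q in product(inventory, repeat=2):
        candidates.append(before + [p, q] + after)  # claim 29: two phones for one
    if after:
        for p in inventory:
            candidates.append(before + [p] + after[1:])  # claim 30: merge with right neighbor
    if before:
        for p in inventory:
            candidates.append(before[:-1] + [p] + after)  # claim 31: merge with left neighbor
    # Drop duplicates and the unedited pronunciation, keeping first-seen order.
    seen = {tuple(phones)}
    unique = []
    for cand in candidates:
        key = tuple(cand)
        if key not in seen:
            seen.add(key)
            unique.append(cand)
    return unique


# Hypothetical example: the middle phone is taken to be the weakest.
result = edit_alternates(["K", "AE", "T"], idx=1, inventory=["AH", "EH"])
```

Each candidate produced this way would still need to be scored before any of them is added to a dictionary.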